Unsupervised Morpheme Discovery with Ungrade

نویسندگان

  • Bruno Golénia
  • Sebastian Spiegler
  • Peter A. Flach
چکیده

In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, such that the description length of the window is minimised. In the next step prefix and suffix sequences are sought using a morpheme graph. The last step consists in combining morphemes found in the previous steps. UNGRADE has been experimentally evaluated on 5 languages (English, German, Finnish, Turkish and Arabic) with encouraging results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

UNGRADE: UNsupervised GRAph DEcomposition

This article presents an unsupervised algorithm for word decomposition called UNGRADE (UNsupervised GRAph DEcomposition) to segment any word list of any language. UNGRADE assumes that each word follows the structure prefixes, a stem and suffixes without giving a limit on the number of prefixes and suffixes. The UNGRADE’s algorithm works in three steps and is language independent. Firstly, a pse...

متن کامل

Unsupervised Morpheme Discovery with Allomorfessor

We describe Allomorfessor, which extends the unsupervised morpheme segmentation method Morfessor to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. The method discovers common base forms for allomorphs from an unannotated corpus by finding small modifications, called mutations, for them. Using Maximum a Posteriori estimation, the mode...

متن کامل

Morpho Challenge 2005-2010: Evaluations and Results

Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...

متن کامل

Morpho Challenge competition 2005-2010: Evaluations and results

Morpho Challenge is an annual evaluation campaign for unsupervised morpheme analysis. In morpheme analysis, words are segmented into smaller meaningful units. This is an essential part in processing complex word forms in many large-scale natural language processing applications, such as speech recognition, information retrieval, and machine translation. The discovery of morphemes is particularl...

متن کامل

Unsupervised Discovery of Persian Morphemes

This paper reports the present results of a research on unsupervised Persian morpheme discovery. In this paper we present a method for discovering the morphemes of Persian language through automatic analysis of corpora. We utilized a Minimum Description Length (MDL) based algorithm with some improvements and applied it to Persian corpus. Our improvements include enhancing the cost function usin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009